

Appendix A Deferred proofs

Neural Information Processing Systems

In this section, we show the proofs omitted from Sec. 3 and Sec. 4.

A.1 Proof of Lemma 1. We restate Lemma 1 from Sec. 3 and present the proof. First, note that, due to Jensen's inequality, we have a convenient upper bound. For this purpose, in Figure 1 we plot:

Figure 9: Visualization of the key quantities involved in Lemma 2.

We list the detailed evaluation and training details below. The single-layer CNN that we study in Sec. 4 has 4 convolutional filters, each of them of size

We describe here supporting experiments and visualizations related to Sec. 3 and Sec. 4.

C.1 Quality of the linear approximation for ReLU networks. The phenomenon is even more pronounced for FGSM perturbations, as the linearization error is much higher there.

C.2 Catastrophic overfitting in a single-layer CNN. We describe here figures complementary to Sec. 4 that concern the single-layer CNN, including a Laplace filter, which is very sensitive to noise.
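To make the linearization quality discussed in C.1 concrete, the following is a minimal NumPy sketch, not the paper's code: the toy two-layer ReLU model and the names `W`, `v`, `f`, `grad_f` are illustrative assumptions. It compares the true change of the model output under an FGSM-style perturbation with its first-order (linear) approximation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy two-layer ReLU model f(x) = v . relu(W x); W, v are illustrative
# stand-ins, not the networks used in the paper.
W = rng.standard_normal((8, 4))
v = rng.standard_normal(8)

def f(x):
    return v @ np.maximum(W @ x, 0.0)

def grad_f(x):
    # df/dx = W^T (v * 1[W x > 0])
    mask = (W @ x > 0).astype(float)
    return W.T @ (v * mask)

x = rng.standard_normal(4)
eps = 0.3
g = grad_f(x)
delta = eps * np.sign(g)  # FGSM-style l_inf step of radius eps

# |f(x + delta) - f(x) - <g, delta>| is zero when no ReLU flips sign
# inside the eps-ball, and grows when the model is locally non-linear.
lin_err = abs(f(x + delta) - f(x) - g @ delta)
print(f"linearization error at eps={eps}: {lin_err:.4f}")
```

For a piecewise-linear ReLU network, this error is exactly zero whenever the perturbation stays inside one linear region, which is why a large error signals the local non-linearity discussed above.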




Mimicking human sleep as a way to prevent catastrophic forgetting in AI systems

#artificialintelligence

A trio of researchers from the University of California, working with a colleague from the Institute of Computer Science of the Czech Academy of Sciences, has found that catastrophic forgetting in AI systems can be prevented by having such systems mimic human REM sleep. In their paper published in PLOS Computational Biology, Ryan Golden, Jean Erik Delanois, Maxim Bazhenov and Pavel Sanda describe teaching artificial intelligence systems to retain what was learned from an initial task while working on a second task. Prior research has shown that people undergo a process called memory consolidation during REM sleep, whereby recently experienced events are moved into long-term memory to make room for new experiences. Without such a process, the brain undergoes catastrophic forgetting, in which memories of recent events are not retained.


Artificial Neural Networks Learn Better When They Spend Time Not Learning at All - Neuroscience News

#artificialintelligence

Summary: "Off-line" periods during AI training mitigated "catastrophic forgetting" in artificial neural networks, mimicking the learning benefits sleep provides in the human brain. Depending on age, humans need 7 to 13 hours of sleep per 24 hours. During this time, a lot happens: Heart rate, breathing and metabolism ebb and flow; hormone levels adjust; the body relaxes. "The brain is very busy when we sleep, repeating what we have learned during the day," said Maxim Bazhenov, PhD, professor of medicine and a sleep researcher at University of California San Diego School of Medicine. "Sleep helps reorganize memories and presents them in the most efficient way."


Understanding and Improving Fast Adversarial Training

Andriushchenko, Maksym, Flammarion, Nicolas

arXiv.org Machine Learning

A recent line of work has focused on making adversarial training computationally efficient for deep learning models. In particular, Wong et al. (2020) showed that $\ell_\infty$-adversarial training with the fast gradient sign method (FGSM) can fail due to a phenomenon called "catastrophic overfitting", in which the model quickly loses its robustness over a single epoch of training. We show that adding a random step to FGSM, as proposed in Wong et al. (2020), does not prevent catastrophic overfitting, and that randomness is not important per se -- its main role is simply to reduce the magnitude of the perturbation. Moreover, we show that catastrophic overfitting is not inherent to deep and overparametrized networks, but can occur in a single-layer convolutional network with a few filters. In an extreme case, even a single filter can make the network highly non-linear locally, which is the main reason why FGSM training fails. Based on this observation, we propose a new regularization method, GradAlign, that prevents catastrophic overfitting by explicitly maximizing the gradient alignment inside the perturbation set, improving the quality of the FGSM solution. As a result, GradAlign makes it possible to successfully apply FGSM training even for larger $\ell_\infty$-perturbations and to reduce the gap to multi-step adversarial training. The code of our experiments is available at https://github.com/tml-epfl/understanding-fast-adv-training.
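The gradient-alignment quantity mentioned in the abstract can be sketched in a few lines. The following is a minimal NumPy illustration under stated assumptions, not the authors' released code: the toy ReLU model and the names `W`, `v`, `input_grad`, `grad_align_penalty` are illustrative. It computes the penalty $1 - \cos(\nabla_x \ell(x), \nabla_x \ell(x + \eta))$ for a random point $x + \eta$ in the $\ell_\infty$ $\epsilon$-ball, i.e. the quantity a GradAlign-style regularizer drives toward zero (alignment toward one).

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy ReLU model whose input gradient we align; W, v are illustrative
# stand-ins for a real network and its backward pass.
W = rng.standard_normal((16, 8))
v = rng.standard_normal(16)

def input_grad(x):
    # Gradient of f(x) = v . relu(W x) with respect to the input x.
    mask = (W @ x > 0).astype(float)
    return W.T @ (v * mask)

def grad_align_penalty(x, eps):
    """1 - cos(grad at x, grad at a random point of the eps-ball):
    the alignment penalty that GradAlign-style training pushes to 0."""
    eta = rng.uniform(-eps, eps, size=x.shape)  # uniform in the l_inf ball
    g1, g2 = input_grad(x), input_grad(x + eta)
    cos = g1 @ g2 / (np.linalg.norm(g1) * np.linalg.norm(g2) + 1e-12)
    return 1.0 - cos

x = rng.standard_normal(8)
penalty = grad_align_penalty(x, eps=0.5)
print(f"GradAlign-style penalty: {penalty:.4f}")
```

The penalty lies in $[0, 2]$: it is $0$ when the two gradients point in the same direction (the model is locally well approximated by its linearization, so FGSM works) and approaches $2$ when they are anti-aligned, the regime associated with catastrophic overfitting.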